Abstract


The SARS-CoV nucleocapsid (N) protein is a major antigen in severe acute respiratory syndrome. It binds 
to the viral RNA genome and forms the ribonucleoprotein core. The SARS-CoV N protein has also been 
suggested to be involved in other important functions in the viral life cycle. Here we show that the N protein 
consists of two non-interacting structural domains, the N-terminal RNA-binding domain (RBD)
(residues 45-181) and the C-terminal dimerization domain (DD) (residues 248-365), surrounded
by flexible linkers. The C-terminal domain exists exclusively as a dimer in solution. The
flexible linkers are intrinsically disordered and represent potential interaction sites with
other protein and protein-RNA partners. Bioinformatics analysis suggests that other
coronavirus N proteins could share the same modular organization. This study
provides information on the domain structure partition of SARS-CoV N protein and insights into the 
differing roles of structured and disordered regions in coronavirus nucleocapsid proteins. 


Introduction


Coronaviruses are the causative agents of a number of
mammalian diseases which often have significant economic
and health-related consequences [1, 2]. Diseases such as
transmissible gastroenteritis in pigs and avian infectious
bronchitis in chickens often have great impact on the
agricultural industry of a nation [3]. In humans,
coronaviruses are often associated with mild respiratory
illnesses, including the common cold. However, a novel
coronavirus has been identified as the etiological agent of
severe acute respiratory syndrome (SARS), which has a case
fatality rate of ca. 8% [4]. Sequence analysis reveals that
SARS-CoV represents either a new coronavirus group or an
outlier of group 2 coronaviruses [5-8].

The SARS-CoV genome contains five major
open reading frames that encode the replicase 
polyprotein, the spike protein (S), envelope (E), 
membrane glycoprotein (M), and the nucleocapsid 
protein (N). SARS-CoV is an enveloped virus with 
S, M and E proteins as the envelope proteins. The 
N protein binds to the viral RNA genome and 
forms the ribonucleoprotein core, which is pre- 
sumed to be helical. The M protein may also be 
involved in the formation of the nucleocapsid 
through interaction with the N protein. Upon 
infection, the N protein enters the host cell with 
the ribonucleoprotein core and is able to interact
with a number of host proteins [9]. The high 
abundance of the N protein makes it a major 
antigen, an attribute which has often been used in 
the development of rapid-diagnosis kits against 
SARS [10, 11]. 

The nucleocapsid protein is a 422 amino-acid 
protein, sharing only 20-30% homology with the 
N proteins of other coronaviruses [6, 7]. From 
genetic and bioinformatics studies, the N protein 
can be divided into three putative regions: an
N-terminal domain, an RNA-binding domain (RBD)
and a C-terminal domain [12, 13]. The N- and
C-terminal domains are believed to play a role in
interaction with other proteins. A number of recent 
studies have shown that part of the C-terminus in 
the N protein of SARS-CoV is involved in the 
oligomerization process of the protein [14, 15]. 
Rather surprisingly, the mid-portion of the protein
has been shown to interact with the M protein and
hnRNP A1 [16, 17], and structural studies have
identified the region between amino acids 45-181
as the putative RNA-binding region, which is close
to the N-terminus [18]. These discrepancies from
the putative domain partition necessitate the deter- 
mination of both the functional and structural 
organization of the protein. However, the struc- 
tural organization of coronavirus N proteins in 
general remains largely unknown to this day. 

We have combined experimental techniques and
bioinformatics analyses to define the structural
organization of SARS-CoV N protein. Using nuclear
magnetic resonance (NMR) spectroscopy, we present
the first evidence that the SARS-CoV N protein
consists of two independent structural domains.
The first domain lies inside the putative RNA- 
binding domain identified in a previous report [18]. 
The second domain lies in the C-terminal half of 
the protein and is capable of forming dimers in 
solution. The rest of the protein is highly accessible
to the solvent, and bioinformatics analysis predicts
that these regions are intrinsically disordered. Other
coronavirus N proteins share similar features with
SARS-CoV N protein at the sequence level, implying
functional significance. The elucidation of the
modular organization of the SARS-CoV N pro- 
tein, particularly the boundary between disordered 
and structured regions, facilitates future studies of 
this class of proteins at the functional and struc- 
tural level. 


Materials and methods 


Sequence alignment, secondary structure and
order-disorder prediction


The full-length sequences of SARS and other 
coronavirus N proteins were aligned using CLU- 
STALW version 1.83 with the slow algorithm, an 
identity matrix, a window of 4 amino acids and 
standard gap penalties [19]. The result was then 
edited with SeaView based on the position of the 
known structural domains of SARS-CoV N pro- 
tein. The JPred server [20] was used for secondary 
structure prediction. Order—disorder prediction 
was obtained through sequence submission to 
the PONDR server (http://www.pondr.com) using 
the predictor VSL1, which is an implementation of 
the IST-Zoran predictor [21-23]. Access to 
PONDR® was provided by Molecular Kinetics 
(Indianapolis, IN, USA). 
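
The thresholding step behind the order-disorder assignment can be sketched as follows. This is a minimal illustration, not the PONDR algorithm itself: it assumes only that a per-residue score list is available and applies the 0.5 cut-off stated in the Figure 1 caption. The function name `disordered_segments` and the toy scores are hypothetical.

```python
from typing import List, Tuple


def disordered_segments(scores: List[float],
                        threshold: float = 0.5) -> List[Tuple[int, int]]:
    """Return 1-based inclusive (start, end) runs where score > threshold."""
    segments = []
    start = None
    for position, score in enumerate(scores, start=1):
        if score > threshold:
            if start is None:
                start = position          # a new disordered run begins
        elif start is not None:
            segments.append((start, position - 1))  # run ended at previous residue
            start = None
    if start is not None:                 # run extends to the last residue
        segments.append((start, len(scores)))
    return segments


# Toy per-residue scores, made up for illustration (not PONDR output).
toy_scores = [0.9, 0.8, 0.7, 0.2, 0.1, 0.3, 0.6, 0.95]
print(disordered_segments(toy_scores))
```

Scores exactly equal to the threshold are treated as ordered here, matching the "higher than 0.5" wording of the caption.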


Plasmid construction 


We cloned fragments spanning the different ordered
and disordered regions of the SARS-CoV N
protein (Figure 1a) based on PONDR information
(Figure 1b) and reports in the literature [18].
SARS-CoV TW1 strain cDNA sequencing clones 
were kindly provided to us by Dr. P.-J. Chen of 
National Taiwan University Hospital [24]. Clones 
for SARS-CoV N protein fragments were obtained 
by polymerase chain reaction (PCR) on a 
RoboCycler Gradient 96 (Stratagene, CA) using 
appropriate primers. The resulting PCR fragments 
contained an NcoI site at one end and a BamHI site 
at the other. After restriction enzyme digestion, the 
resulting fragments were cloned into pET6H (a gift 
from Prof. J.-J. Lin, National Yang Ming Univer- 
sity, Taiwan) containing a His-tag coding region. 
Full-length SARS-CoV N protein construct was 
obtained by sequential ligation of the cloned PCR 
fragments using appropriate restriction enzyme 
sites. The sequences of all constructs were con- 
firmed by DNA sequencing. The resultant protein 
fragments all include an extra MHHHHHHAMG 
sequence at the N-terminus. 


Protein expression and purification 


For biochemical studies, the SARS-CoV N protein
clones were expressed in the Escherichia coli BL21(DE3)
strain in Luria broth media using standard protocols.
To prepare samples suitable for NMR studies, the cells
were cultured in standard M9 media supplemented with
¹⁵NH₄Cl (1 g/l) and ¹⁵N-Isogro (0.5 g/l) (Isotec, OH,
USA). The cells were then broken with a microfluidizer
and the protein purified through a Ni-NTA affinity
column (Qiagen, CA, USA) in buffer (50 mM sodium
phosphate, 150 mM NaCl, pH 7.4) containing 7 M urea.
The protein was then allowed to refold by gradually
lowering the denaturant concentration through dialysis
in liquid chromatography buffer (50 mM sodium
phosphate, 150 mM NaCl, 1 mM EDTA, 0.01% NaN₃,
pH 7.4). Renatured protein was loaded onto an
AKTA-EXPLORER fast performance liquid chromatography
(FPLC) system equipped with a HiLoad 16/60 Superdex
75 column (Amersham Pharmacia Biotech, Sweden).
Complete Protease Inhibitor cocktail (Roche, Germany)
was added to the purified protein. Protein
concentration was determined with the Bio-Rad Protein
Assay kit as per instructions from the manufacturer
(Bio-Rad, CA, USA). The correct molecular weights of
the expressed proteins were confirmed by mass
spectrometry.

Figure 1. (a) SARS-CoV N protein fragments studied in this paper. Designations of the fragments are listed on the left. (b)
PONDR prediction of the order-disorder regions of SARS-CoV N protein. Hatched regions represent PONDR scores higher than
0.5 and are considered disordered.


Analytical gel-filtration chromatography 


The experiments were conducted using an FPLC
system (Pharmacia Biotech, Sweden) with a
HiLoad 16/60 Superdex 75 (prep grade) column at an
elution rate of 1 ml/min. The molecular weights of
the proteins were estimated from the elution 
profile calibrated with the LMW Gel Filtration 
Calibration Kit (Amersham, UK). 
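
A common way to turn such a calibration into mass estimates is to fit log10(molecular mass) against elution volume and interpolate. The sketch below assumes that log-linear form; the source does not state the fitting procedure, and the calibrant volumes and masses are made-up placeholders, not values from the calibration kit or from this study.

```python
import math
from typing import Sequence, Tuple


def fit_log_linear(volumes_ml: Sequence[float],
                   masses_kda: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of log10(mass) = slope * volume + intercept."""
    if len(volumes_ml) != len(masses_kda) or len(volumes_ml) < 2:
        raise ValueError("need at least two matched calibration points")
    logs = [math.log10(m) for m in masses_kda]
    n = len(volumes_ml)
    mean_v = sum(volumes_ml) / n
    mean_l = sum(logs) / n
    sxx = sum((v - mean_v) ** 2 for v in volumes_ml)
    sxy = sum((v - mean_v) * (l - mean_l) for v, l in zip(volumes_ml, logs))
    slope = sxy / sxx
    return slope, mean_l - slope * mean_v


def estimate_mass_kda(volume_ml: float, slope: float, intercept: float) -> float:
    """Interpolate an apparent mass from an elution volume."""
    return 10 ** (slope * volume_ml + intercept)


# Hypothetical calibrants (elution volume in ml, mass in kDa).
CAL_VOLUMES_ML = [50.0, 60.0, 70.0]
CAL_MASSES_KDA = [100.0, 10 ** 1.5, 10.0]

slope, intercept = fit_log_linear(CAL_VOLUMES_ML, CAL_MASSES_KDA)
print(f"apparent mass at 65 ml: {estimate_mass_kda(65.0, slope, intercept):.1f} kDa")
```

Apparent masses obtained this way depend on molecular shape as well as mass, so they are best read as estimates for globular proteins.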


Chemical cross-linking


The homo-bifunctional amine cross-linker disuccin- 
imidyl suberate was purchased from Sigma-Aldrich 
(MO, USA) and was dissolved in N,N-dimethylfor- 
mamide (DMF) to a concentration of 25 mg/ml. 
Reactions were carried out at a final protein
concentration of 0.35 mM and a final disuccinimidyl
suberate concentration of 5 mM. Mock reactions
containing only the protein solution and DMF, without
disuccinimidyl suberate, were set up as controls.
The reaction mixtures in standard buffer
were allowed to react for 1 h at 4 °C prior to
quenching with 100 mM glycine (final concentration).
The results were visualized on SDS-PhastGel
minigels (Pharmacia Biotech, Sweden). 
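
The concentrations quoted above imply a roughly 14-fold molar excess of cross-linker over protein. The arithmetic is sketched below. The molar mass used for disuccinimidyl suberate (368.34 g/mol, from the formula C16H20N2O8) is an assumption that is not stated in the text, and the function names are illustrative only.

```python
DSS_MOLAR_MASS_G_PER_MOL = 368.34  # assumed value, not given in the text


def molar_excess(crosslinker_mm: float, protein_mm: float) -> float:
    """Ratio of cross-linker to protein, both in mM."""
    return crosslinker_mm / protein_mm


def stock_molarity_mm(stock_mg_per_ml: float, molar_mass_g_per_mol: float) -> float:
    """Convert a mg/ml stock to mM (mg/ml equals g/l)."""
    return stock_mg_per_ml / molar_mass_g_per_mol * 1000.0


excess = molar_excess(5.0, 0.35)                           # final concentrations from the text
stock_mm = stock_molarity_mm(25.0, DSS_MOLAR_MASS_G_PER_MOL)  # 25 mg/ml stock in DMF
dilution = stock_mm / 5.0                                  # fold dilution to reach 5 mM

print(f"molar excess: {excess:.1f}x")
print(f"stock concentration: {stock_mm:.1f} mM")
print(f"stock dilution factor: {dilution:.1f}")
```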


Sedimentation velocity analysis 


Sedimentation velocity studies were carried out
with a Beckman-Coulter XL-A analytical
ultracentrifuge with an An-60 Ti rotor at 20 °C and
40,000 rpm. Protein samples were diluted to
0.40-0.75 mg/ml and loaded into standard
double-sector cells with aluminum or Epon charcoal-filled
centerpieces. The UV absorption of the cells was
scanned at 280 nm in continuous mode every
10 min for a period of 5 h. The data were analyzed
with Sedfit version 8.9d. Collections of 10-15
radial scans were used for analysis, and 200
sedimentation coefficients between 2 and 10 S
were employed in calculating the c(S) distribution.
The positions of the meniscus and cell bottom
were determined by visual inspection, and then
refined in the final fit. The partial specific volumes
for N45-181, N248-365 and N45-365 were
calculated from the amino acid compositions to be
0.7192, 0.7244 and 0.7198 ml/g, respectively. The
solvent density and viscosity were calculated with 
Sednterp version 1.08. All samples were visually 
checked for clarity after ultracentrifugation, and 
no indication of precipitation was found. 
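
The partial specific volume enters the analysis through the buoyancy term (1 - v̄ρ) of the Svedberg relation, M = sRT / (D(1 - v̄ρ)). The sketch below evaluates only that term for the three fragments. The solvent density of 1.0 g/ml is a placeholder assumption; the value actually calculated with Sednterp is not reported in the text.

```python
# Partial specific volumes (ml/g) quoted in the text.
PARTIAL_SPECIFIC_VOLUME = {
    "N45-181": 0.7192,
    "N248-365": 0.7244,
    "N45-365": 0.7198,
}

ASSUMED_SOLVENT_DENSITY = 1.0  # g/ml, placeholder; not the value used by the authors


def buoyancy_factor(vbar_ml_per_g: float, density_g_per_ml: float) -> float:
    """Return the dimensionless buoyancy term (1 - vbar * rho)."""
    return 1.0 - vbar_ml_per_g * density_g_per_ml


for fragment, vbar in PARTIAL_SPECIFIC_VOLUME.items():
    factor = buoyancy_factor(vbar, ASSUMED_SOLVENT_DENSITY)
    print(f"{fragment}: 1 - vbar*rho = {factor:.4f}")
```

Because the three values of v̄ differ by less than 1%, the buoyancy term is nearly the same for all fragments under any single solvent density.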


NMR spectroscopy


¹⁵N-labeled protein samples were extensively
exchanged with NMR buffer (100 mM sodium
phosphate buffer, pH 6.0, containing 50 mM
NaCl, 1 mM EDTA, 1 mM
2,2-dimethyl-2-silapentane-5-sulfonate, 0.01% NaN₃, 10% D₂O and
Complete Protease Inhibitor cocktail) using an
Amicon-15 concentrator (Amicon, MA, USA).
The final concentrations of the samples were
between 0.2 and 3 mM, depending on the solubility
of the different fragments. All the NMR data were
acquired at 27 and 30 °C on 500, 600 or 800 MHz
Bruker AVANCE spectrometers equipped with a
triple resonance (¹H, ¹³C and ¹⁵N) TXI probe with
an actively shielded Z-gradient. Experimental
parameters were set as described previously [25,
26]. CLEANEX-PM spectra, which only show
resonances exchanging rapidly with the solvent
(k_ex > 2 Hz), were obtained as described [27, 28].
Data were processed with the XWINNMR suite
and AURELIA software (Bruker, Germany) on
SGI workstations. The ¹H chemical shift was
referenced to 2,2-dimethyl-2-silapentane-5-sulfonate
at 0 ppm. The ¹⁵N chemical shift was referenced using the
consensus ratio Ξ of 0.101329118 for ¹⁵N/¹H [29].
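
Indirect referencing with a frequency ratio amounts to multiplying the ¹H zero-ppm frequency by the quoted ratio to obtain the ¹⁵N zero-ppm frequency. A minimal sketch follows, in which the 800.13 MHz ¹H frequency and the function names are illustrative assumptions rather than values from this study.

```python
XI_15N = 0.101329118  # 15N/1H frequency ratio quoted in the text


def n15_zero_frequency_mhz(h1_zero_frequency_mhz: float,
                           xi: float = XI_15N) -> float:
    """Absolute frequency of 0 ppm in the 15N dimension."""
    return h1_zero_frequency_mhz * xi


def shift_ppm(observed_mhz: float, zero_mhz: float) -> float:
    """Chemical shift of a resonance relative to the 0 ppm frequency."""
    return (observed_mhz - zero_mhz) / zero_mhz * 1e6


# Hypothetical 1H frequency of the reference signal at 0 ppm.
h1_zero = 800.13
n15_zero = n15_zero_frequency_mhz(h1_zero)
print(f"15N 0 ppm frequency: {n15_zero:.6f} MHz")

# A resonance 120 ppm above the 15N zero frequency, converted back to ppm.
observed = n15_zero * (1 + 120e-6)
print(f"shift: {shift_ppm(observed, n15_zero):.2f} ppm")
```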


Results 


SARS-CoV N protein contains two independent 
structural domains 


A series of N protein fragments spanning different
regions were constructed based on the PONDR
prediction (Figure 1). We used a series of
¹⁵N-HSQC spectra of these fragments to define the
position of the structural domains of SARS-CoV N
protein (Figure 2). NMR chemical shifts of amide
resonances are sensitive to structural changes, and
the pattern of the ¹⁵N-HSQC spectrum has been
commonly used to monitor order-disorder of
proteins [30]. Well-dispersed spectra are indicative
of structured protein, whilst congested spectra
with resonances clustered within a small region
of 8.3 ± 0.5 ppm in the proton dimension are
indicative of disordered protein. We observed that
the resonances from N45-181 have good chemical
shift dispersion (Figure 2a), indicating that the
fragment is structured. The spectrum of N1-181 is a
superposition of well-dispersed resonances and a
cluster of overlapping resonances around
8.3 ± 0.4 ppm (Figure 2b). Comparing the spectra
of N1-181 and N45-181 revealed that all
resonances belonging to N45-181 were present in the
spectrum of N1-181 with no change in resonance
position. These results indicate that the N-terminal
flanking region between amino acids 1-44 does not
affect the structure of the N45-181 domain.
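
The dispersion criterion described above can be expressed as the fraction of amide ¹H shifts that fall inside the 8.3 ± 0.5 ppm window. The sketch below is only an illustration of that idea: the peak lists are invented, and no numerical cut-off for calling a fragment disordered is implied by the text.

```python
from typing import Sequence


def fraction_in_window(h1_shifts_ppm: Sequence[float],
                       center: float = 8.3,
                       half_width: float = 0.5) -> float:
    """Fraction of amide 1H shifts lying within center +/- half_width."""
    if not h1_shifts_ppm:
        raise ValueError("peak list is empty")
    inside = sum(1 for shift in h1_shifts_ppm if abs(shift - center) <= half_width)
    return inside / len(h1_shifts_ppm)


# Invented peak lists for illustration only.
dispersed_peaks = [6.9, 7.4, 7.9, 8.1, 8.6, 9.3, 9.8, 10.2]
clustered_peaks = [7.7, 8.0, 8.1, 8.2, 8.3, 8.4, 8.5, 8.6]

print(fraction_in_window(dispersed_peaks))
print(fraction_in_window(clustered_peaks))
```

A low fraction corresponds to a well-dispersed spectrum, while a fraction close to 1 corresponds to the congested pattern associated with disorder.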

Figure 2. ¹⁵N-HSQC spectra of the SARS-CoV N protein fragments. (a) u-¹⁵N-N45-181. (b) u-¹⁵N-N1-181. (c) u-¹⁵N-N248-365.
(d) u-(²H, ¹⁵N)-N248-422. (e) Overlay spectrum of u-¹⁵N-N45-181 and u-¹⁵N-N248-365. (f) u-(²H, ¹⁵N)-N45-365. All
spectra were obtained on a Bruker AVANCE 800 MHz spectrometer at 27 °C. NMR samples contained 0.2-1 mM protein in
10 mM sodium phosphate buffer, 50 mM NaCl, 1 mM EDTA, 1 mM 2,2-dimethyl-2-silapentane-5-sulfonate, 0.01% NaN₃, pH 6.0
in 10% D₂O. The spectra shown at the bottom (b, d and f) are almost identical to those shown on the top (a, c and e) except that
the bottom three spectra contain additional resonances in the 7.5-9 ppm (¹H dimension) region. These resonances arise from the
additional disordered residues in the longer protein fragments.


To assess the structure of the C-terminal region,
several C-terminal fragments were prepared for
the collection of ¹⁵N-HSQC spectra. We found
that the resonances from N248-365 are well
dispersed (Figure 2c), suggesting that N248-365
forms an ordered structure. To define the
structural boundaries, we constructed fragments
containing N- and C-terminal extensions.
Figure 2d shows the ¹⁵N-HSQC spectrum of a
uniformly ¹⁵N-labeled N248-422 sample.
Comparing the spectrum of N248-422 with that
of N248-365 (Figure 2c), we found that all
resonances due to N248-365 can be identified in
Figure 2d. These results indicate that residues
from 366 to the C-terminus do not affect the
structure of N248-365. Shortening the fragment to
span amino acids 274-365 changes the ¹⁵N-HSQC
resonance pattern, which indicates that the
248-273 region is important for structure stabilization
of this domain (data not shown).

To explore the structure of the region between
residues 182-247 and its effect on the structure
of N45-181 and N248-365, we constructed the
fragment N45-365, which contains the two
structured domains and the inter-domain residues.
Comparing the ¹⁵N-HSQC spectrum of N45-365
(Figure 2f) to that in Figure 2e, which is the
overlay of the spectra from N45-181 (Figure 2a)
and N248-365 (Figure 2c), we observed that the
resonances from N45-181 and N248-365 overlap
perfectly with the corresponding resonances from
N45-365, indicating that the structures of the two
domains, N45-181 and N248-365, are not altered
in N45-365. The lack of resonance perturbation
when the two domains are linked together suggests
that interaction between these two domains is
weak, if they interact at all. We conclude
that SARS-CoV N protein contains two
independent structural domains located at a.a. 45-181 and
248-365. These results are consistent with the
PONDR prediction.


SARS-CoV N protein contains three intrinsically 
disordered regions 


PONDR predicts three intrinsically disordered
regions in SARS-CoV N protein located at the
N-terminus, the C-terminus and between the two
ordered regions (Figure 1b). We also observed
additional resonances clustered around
8.3 ± 0.5 ppm in the proton dimension whenever
the fragment was extended beyond the two
structural domains (Figure 2). To test whether
the residues beyond the structural domains are
truly disordered, we employed the CLEANEX-PM
experiment to identify solvent-accessible
resonances [27]. The ¹⁵N-HSQC spectrum obtained
with the CLEANEX-PM pulse sequence contains
only resonances from solvent-exposed amide
groups. When we compared the CLEANEX-PM
spectrum of N1-181 (Figure 3b) with that of
N45-181 (Figure 3a), we observed 40 resonances
that appeared only in N1-181 and not in N45-181.
This number agrees with that expected for the
N-terminal region (5 prolines), indicating that all
amide protons in the N-terminus of SARS-CoV N
protein are exposed to the solvent. We counted 39
additional peaks in the CLEANEX-PM spectrum of
N248-422 (Figure 3d) relative to that of N248-365
(Figure 3c) (51 expected since there are 6
prolines), suggesting that the majority of the
C-terminal residues are also solvent-exposed. When we
compared the CLEANEX-PM spectra of N45-181
(Figure 3a), N248-365 (Figure 3c) and N45-365
(Figure 3f), we observed the extra resonances
representing the region between residues 182-247.
A total of 27 additional peaks can be resolved,
compared to 64 expected (2 prolines), indicating
that about half of the linker region between
residues 182-247 is exposed to the solvent. It
should be noted here that, due to resonance
overlap, the numbers counted should be
viewed as a lower limit for the number of
solvent-exposed residues. Nevertheless, we can
conclude that all N-terminal residues are
solvent-exposed, whilst most of the residues in the
C-terminus and in the linker region between the
two structural domains are exposed to the
solvent as well. In conjunction with the
observation that all additional resonances are observed
within 8.3 ± 0.5 ppm in the proton dimension
and the PONDR results, we conclude that amino
acids 1-44, 182-247 and 366-422 are disordered.
The long disordered linker between the two
structural domains is consistent with the
observation that there is little interaction between the
two domains. However, the numbers of counted
peaks in the CLEANEX-PM spectra of the
C-terminus and the linker region are lower than
expected, so it is likely that parts of these regions
are solvent-protected, possibly through the
formation of transient structures. Attempts to obtain
a spectrum of the linker region alone were
unsuccessful due to the extremely poor protein
expression of the clone harboring the linker
sequence.

Figure 3. ¹⁵N-edited CLEANEX-PM spectra of (a) u-¹⁵N-N45-181, (b) u-¹⁵N-N1-181, (c) u-¹⁵N-N248-365, (d) u-(²H, ¹⁵N)-N248-422,
(e) u-¹⁵N-N45-181 and u-¹⁵N-N248-365 overlaid on each other and (f) u-(²H, ¹⁵N)-N45-365. Spectra were obtained on a
Bruker AVANCE 800 MHz spectrometer at pH 6.0 and 27 °C. The extra resonances are mostly clustered between 7.5 and 9 ppm
in the ¹H dimension. The numbers of resonances are much larger in the spectra (b), (d), and (f) that contain the disordered
regions.

Figure 4. (a) Analytical gel-filtration chromatography of SARS-CoV N protein fragments. Fragments employed for obtaining the
traces are indicated. (b) SDS-PAGE results of N45-181 (Lanes 1 and 2) and N248-365 (Lanes 3 and 4). M: molecular weight
marker. Lanes 1 and 3 are mock reactions without disuccinimidyl suberate. The corresponding gel traces of N45-181 and N248-365
after reacting with disuccinimidyl suberate for 1 h at 4 °C are shown on lanes 2 and 4, respectively. (c) Sedimentation velocity
studies of SARS-CoV N protein fragments. The distribution of the sedimentation coefficient (top) and molecular mass (bottom) of
SARS-CoV N protein fragments N45-181, N248-365 and N45-365. (d) A model of the overall structure of SARS-CoV N protein.
The two solids represent the two structural domains, the RNA-binding domain (RBD) and the dimerization domain (DD). The
wavy lines represent disordered segments.
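
The expected peak counts quoted above follow from the number of residues in each region minus its prolines, which lack an amide proton. A short sketch of that bookkeeping for the C-terminal tail and the linker is given below, using only the residue ranges, proline counts and observed peak counts from the text. The function and variable names are illustrative.

```python
def expected_amide_peaks(first_residue: int, last_residue: int,
                         n_prolines: int) -> int:
    """Residues in the inclusive range minus prolines (no backbone amide proton)."""
    return (last_residue - first_residue + 1) - n_prolines


# (first residue, last residue, prolines, peaks counted) as quoted in the text.
REGIONS = {
    "C-terminal tail": (366, 422, 6, 39),
    "linker": (182, 247, 2, 27),
}

for name, (first, last, prolines, counted) in REGIONS.items():
    expected = expected_amide_peaks(first, last, prolines)
    print(f"{name}: {counted} of {expected} expected peaks "
          f"({counted / expected:.0%} observed)")
```

As the text notes, peak overlap means the counted numbers are lower limits, so the observed fractions understate the true solvent exposure.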


The C-terminal structural domain is sufficient 
for dimerization 


N45-181 has been identified as an RNA-binding
domain. The function of N248-365 is not clear,
but many reports have identified the C-terminal
half of SARS-CoV N protein to be involved in
oligomerization [14, 15]. To test this possibility, we
applied analytical gel-filtration chromatography,
chemical cross-linking and analytical
ultracentrifugation to assay the self-association
property of the N protein fragments. As shown in
Figure 4a, N45-181 elutes at a molecular weight of
18 kDa and N248-365 elutes as a 28-kDa molecule,
suggesting that N45-181 exists as a monomer and
N248-365 exists as a dimer. The self-association
between the two N248-365 monomers is very strong,
since we could not detect any monomeric fraction.
Similarly, N45-365 eluted at a molecular weight of
~70 kDa, suggesting that N45-365 also exists as a
dimer. Furthermore, when the N45-181 sample was
mixed with the N248-365 sample, two peaks at 18 and
28 kDa were observed in the elution profile,
demonstrating that the two fragments do not interact
with each other. Cross-linking experiments shown in
Figure 4b detected the presence of only monomer for
N45-181 and both monomer and dimer for N248-365.

Figure 5. Order-disorder prediction of coronavirus N proteins through the PONDR server. Swiss-Prot/TrEMBL accession codes
are included in parentheses. Hatched regions represent disordered segments. HCoV OC43: Human coronavirus strain OC43
(P33469); BCoV: Bovine coronavirus strain Quebec (P59712); PHEV: Porcine hemagglutinating encephalomyelitis virus (Q8BB23);
MHV-1: Mouse hepatitis virus (P18446); IBV: Avian infectious bronchitis virus (P32923); TGEV: Porcine transmissible
gastroenteritis virus (P05991); FCoV: Feline coronavirus (O12298); HCoV 229E: Human coronavirus strain 229E (P15130). All coronavirus
N proteins in this study share the same order-disorder profile.

Figure 6. Predicted secondary structure alignment of the putative structural domains of coronavirus N proteins. Virus designations are
the same as in Figure 5. Residue numbers are listed. H: α-helix; E: extended β-strand. (a) Alignment of the N-terminal domain. The three
conserved β-strands are enclosed. (b) Alignment of the C-terminal domain. Conserved secondary structure elements are enclosed.
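
The monomer/dimer assignment from the gel-filtration estimates can be checked with a rough calculation: divide the apparent mass by an approximate monomer mass and round. The sketch below assumes an average residue mass of 110 Da, which is a rule of thumb and not a value from this study, and includes the 10-residue MHHHHHHAMG tag described in the Methods. All names are illustrative.

```python
AVERAGE_RESIDUE_MASS_KDA = 0.110  # rule-of-thumb assumption, not from the text
TAG_LENGTH = 10                   # MHHHHHHAMG tag described in the Methods


def approx_monomer_mass_kda(first_residue: int, last_residue: int,
                            tag_length: int = TAG_LENGTH) -> float:
    """Approximate monomer mass from residue count, including the tag."""
    n_residues = last_residue - first_residue + 1 + tag_length
    return n_residues * AVERAGE_RESIDUE_MASS_KDA


def apparent_oligomeric_state(observed_kda: float, monomer_kda: float) -> int:
    """Nearest whole number of monomers matching the observed mass."""
    return round(observed_kda / monomer_kda)


# (first residue, last residue, apparent gel-filtration mass in kDa) from the text.
FRAGMENTS = {
    "N45-181": (45, 181, 18.0),
    "N248-365": (248, 365, 28.0),
    "N45-365": (45, 365, 70.0),
}

for name, (first, last, observed) in FRAGMENTS.items():
    monomer = approx_monomer_mass_kda(first, last)
    state = apparent_oligomeric_state(observed, monomer)
    print(f"{name}: monomer ~{monomer:.1f} kDa, apparent state {state}")
```

Gel-filtration masses depend on shape as well as mass, so this ratio is only a consistency check alongside the cross-linking and sedimentation data.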

The quaternary structures of the N45-181,
N248-365 and N45-365 fragments were further
examined by analytical ultracentrifugation. Only
one major peak was detected for each of these
three protein fragments, indicating that they are
structurally homogeneous in solution. The results
of data analysis with Sedfit version 8.9d showed
that protein fragments N45-181, N248-365 and
N45-365 sediment at 1.4 S, 2.6 S and 3.7 S
(Figure 4c), corresponding to molecular masses of
10, 36 and 68 kDa, respectively. These results
confirmed that N45-181, N248-365 and N45-365
exist as a monomer, dimer and dimer, respectively,
in agreement with the results of gel-filtration
chromatography and chemical cross-linking. Taken
together, all three results indicate that N45-181
exists as a monomer and N248-365 as a dimer.
The fact that dimerization occurs through a
structural domain strongly suggests that the process
is dependent on the structure. A model of the
SARS-CoV N protein interaction based on our
current results is shown in Figure 4d. It is
interesting to note that we did not observe the
formation of higher-order multimers in our studies,
which may be important for the formation of the
ribonucleoprotein complex within the virion. A
possible explanation is that multimer formation
may require additional factors, such as the
presence of RNA or other parts of the N protein that
were not present in our samples. We also cannot
exclude the possibility that multimers form at
much higher protein concentrations than the ones
used in these studies. We suggest that the dimeric
form represents a basic building block of the
nucleocapsid of SARS-CoV.


Order-disorder profiles are conserved among
coronavirus N proteins


Since coronavirus N proteins belong to the same
protein family, it is probable that they share
similar structural features. Comparison of the
order-disorder profile of these proteins (Figure 5)
shows that they all share the same disordered
regions (hatched regions). There are two long
disordered regions in the middle and at the
C-termini of the proteins, whereas the length of
the N-terminal disordered region shows more
variability. Two ordered regions are located
between the disordered regions, and their locations
generally match those of the structural domains in
SARS-CoV N protein.

Disordered regions are often involved in 
biomolecular interactions. The C-terminus of 
MHV N protein, which is disordered, has been 
shown to interact with hnRNP A1 [31], whereas 
the disordered region in the middle is responsible 
for its RNA-binding activity [13, 32]. In
SARS-CoV, the disordered region in the middle of the N
protein has been implicated in N-protein
self-interaction [33], interaction with the M protein [16]
and interaction with hnRNP A1 [17]. These
experimental observations suggest that disordered regions of
coronavirus N proteins are probable interaction
sites with functional implications.


Ordered regions of coronavirus N proteins share
similar secondary structure profiles 


Secondary structure alignment of coronavirus N
protein sequences based on the two structural
domains of SARS-CoV N protein shows that
they share very similar secondary structure
profiles (Figure 6). The N-terminal domain has
three conserved β-strands, which have been
implicated in RNA binding in SARS-CoV [18].
The C-terminal domain is also mostly conserved
in terms of secondary structure position within
the sequence. The extensive secondary structure
and high similarity suggest that the two
structural domains observed in SARS-CoV N protein
also exist in the N proteins of other
coronaviruses.

The results from the order—disorder prediction 
and secondary structure prediction coupled with 
sequence alignment suggest that coronavirus N 
proteins all share the same modular organization. 
The two structural domains are connected by a 
disordered linker and capped by disordered N- 
terminal head and C-terminal tail. 


Discussion 


Role of the structural domains of SARS-CoV N 
protein 


The two structural domains of SARS-CoV N
protein carry out two distinct functions. The
N-terminal domain is able to bind RNA, whereas
the C-terminal domain acts as a dimerization
domain. The ability of the N-terminal domain to
bind RNA is closely related to its structure.
Although the structure of the C-terminal domain 
has not been determined, we suggest that dimer- 
ization is also structure-dependent. A number of 
experimental observations support our hypothesis: 
First, it has been found that oligomer dissociation 
and protein unfolding of SARS-CoV N protein 
occur simultaneously [34]; second, most self-inter- 
action studies have mapped the oligomerization 
domain to regions containing the structural 
domain [14, 15]. The structural domains may also 
serve additional functions. For example, a putative 
loop between W302 and P310 in the C-terminal 
domain has been suggested to bind to cyclophilin 
A [35]. These additional functions may also be 
dependent on the structure of the protein. 

Although the two structural domains do not 
interact with each other, we cannot discount the 
possibility that the two domains could act in 
concert to carry out important biological func- 
tions. The long flexible linker between the two 
domains provides enough freedom to make this 
scenario possible. Previously, the lack of informa- 
tion on structural organization precluded the 
study of multiple-domain interactions. Now our 
findings provide a structural framework to 
perform such studies. 


The flexible linker as an interaction hotspot 


The flexible linker between the two structural
domains is largely disordered. This disordered
region may enable transient interactions with
several structurally distinct partners. It has been
shown that the M protein of SARS-CoV binds to
this region between a.a. 168-208 [16].
Interestingly, human cellular hnRNP A1 has also been
shown to bind to almost the same region between
a.a. 161-210 [17]. The disordered state of this
region potentially allows it to interact with
different partners depending on context, e.g. with the M
protein during virus assembly and with hnRNP A1
during host cell infection. The exact mechanism by
which this occurs is not known, but it could
involve different induced folding pathways, which
have been shown to occur in other disordered
proteins [23, 36, 37].
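
To see how the reported binding regions sit relative to the domain boundaries defined in this study, the residue ranges can be intersected directly. The sketch below uses the boundaries stated in the Results (1-44, 45-181, 182-247, 248-365, 366-422) and the two binding regions quoted above. The helper names are illustrative.

```python
from typing import Dict, Tuple

Range = Tuple[int, int]  # inclusive residue range

REGIONS: Dict[str, Range] = {
    "N-terminal tail": (1, 44),
    "RBD": (45, 181),
    "linker": (182, 247),
    "DD": (248, 365),
    "C-terminal tail": (366, 422),
}


def overlap_length(a: Range, b: Range) -> int:
    """Number of residues shared by two inclusive ranges."""
    start = max(a[0], b[0])
    end = min(a[1], b[1])
    return max(0, end - start + 1)


def map_site(site: Range) -> Dict[str, int]:
    """Residues of a binding site falling in each region (non-zero only)."""
    result = {}
    for name, bounds in REGIONS.items():
        shared = overlap_length(site, bounds)
        if shared > 0:
            result[name] = shared
    return result


print("M protein site 168-208:", map_site((168, 208)))
print("hnRNP A1 site 161-210:", map_site((161, 210)))
```

Both reported sites span the end of the RNA-binding domain and the start of the linker, with the larger share of residues in the linker.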

The same phenomenon is observed in other
coronavirus N proteins. In mouse hepatitis virus
(MHV), the region corresponding to the flexible
linker in its N protein is involved in RNA binding
[13, 32]. The same region has also been shown to
bind murine hnRNP A1 in infected cells [31]. It
seems that the coronavirus N proteins share the
common theme of using the flexible linker as an
interaction "hotspot", and use characteristics of
disordered regions to achieve multiple functions
within a limited sequence length.


Disordered regions are potential phosphorylation 
sites 


Phosphorylation is one of the most important
regulatory post-translational modifications in
proteins. SARS-CoV N protein has been shown to be
serine-phosphorylated by multiple kinases, and
phosphorylation has been proposed as a possible
mechanism for nucleocytoplasmic shuttling of the
N protein [38]. Disordered regions represent
potential sites for phosphorylation. The flexible
linker of SARS-CoV N protein contains an
SR-rich region, which is targeted by a number of
kinases [39]. In fact, this region can be
phosphorylated in vitro (Dr. W.-Y. Tarn, personal
communication). Recent in silico prediction suggested that
most of the potential phosphorylation sites fall in
the disordered regions, although the exact
phosphorylation sites have not been identified
experimentally [38]. Although the exact role of
phosphorylation has not been elucidated, it could
be involved in regulating functions such as RNA
binding and localization within the host cell.

The phosphorylation sites of other coronavirus
N proteins that have been studied also fall
in the disordered regions. In avian infectious
bronchitis virus (IBV), the phosphorylation sites
of the N protein have been mapped to a.a. 186-198
and 367-394 [40]. Both regions are located
in the disordered regions predicted by PONDR
(Figure 5). Phosphorylation of transmissible
gastroenteritis virus (TGEV) N protein has also been
mapped to residues 9, 156, 254 and 256, which are
at or close to the disordered regions [41].
Phosphorylation in disordered regions of structural
proteins is also observed in other virus families,
such as in Paramyxovirinae [42]. Coronavirus N
proteins seem to employ a widespread property to
allow for modification. Whether or not such
modification affects the folding or structural
properties of the protein and how these properties
affect its function remain to be determined.


Implications for structural and functional studies 
of coronaviral N proteins 


Identification of the disordered regions of SARS- 
CoV N protein provides a blueprint for structural 
studies of the protein. The structural domains are 
logical candidates for structural determination 
through X-ray crystallography or solution NMR 
studies. However, structure determination of the 
full-length protein is hindered by the disordered 
regions, which often interfere with crystallization 
[43]. The large size of the dimeric protein (ca.
90 kDa) also makes full-length structure
determination through NMR extremely difficult because of
fast transverse (T2) relaxation. The fact that the two
structural domains do not interact provides a handle to
solve this problem. The structures of the two domains
can be solved independently and still provide a fair
representation of the full-length protein.

The modular organization of SARS-CoV N
protein is shared among other coronaviruses. The
relative positions of the two structural domains are
fairly conserved in all coronavirus N proteins,
making them excellent targets for comparative
structural studies. The structures of the N-terminal
domains would be of special interest, since in
SARS-CoV this domain has been identified as an
RNA-binding domain, whereas in other coronaviruses
its exact function is not yet known. Of special
note is the RNA-binding domain of MHV, which
has been mapped to the flexible linker region
instead of the N-terminal structural domain. At
present the molecular mechanism of N
protein/RNA interaction is still not fully
understood and the RNA binding site(s) have not been
unequivocally defined. It is possible that the
N-terminal structural domain folds into different
tertiary structures and plays different roles in
different coronavirus N proteins. It is also possible
that the linker region is involved in
RNA binding. Another interesting point that
needs further study is the role of the C-terminal
structural domain. It is not yet known whether it
plays the same dimerization role in other
coronaviruses as in SARS-CoV, although there are hints in
the literature [44].


In summary, we draw the following conclusions:
(1) The N protein of SARS-CoV is a two-domain
protein whose domains are connected by a flexible
linker. The protein is capped by a disordered
N-terminal head and C-terminal tail. (2) The
C-terminal structural domain is sufficient for
dimerization, implying a structural role in the
process. (3) Based on findings by other groups and
our structural data, disordered regions of SARS-CoV
N protein are potentially important interaction
sites with functional implications. However, the
exact roles of the disordered regions are yet to be
defined. (4) The modular organization of SARS-CoV
N protein is likely shared by the N proteins of
other coronaviruses. Our conclusions open up new
avenues for the study of coronavirus N proteins on
a domain basis, including the study of complex
interactions involving the different domains.